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Abstract 

Background: Identifying genes of adaptive significance in a changing environment is a major focus of ecological 
genomics. Such efforts were restricted, until recently, to researchers studying a small group of model organisms or 
closely related taxa. With the advent of next generation sequencing (NGS), genomes and transcriptomes of virtually 
any species are now available for studies of adaptive evolution. We experimentally manipulated temperature 
conditions for two groups of crimson spotted rainbowfish {Melanotaenia duboulayi) and measured differences in 
RNA transcription between them. This non-migratory species is found across a latitudinal thermal gradient in 
eastern Australia and is predicted to be negatively impacted by ongoing environmental and climatic change. 

Results: Using next generation RNA-seq technologies on an lllumina HiSeq2000 platform, we assembled a de novo 
transcriptome and tested for differential expression across the treatment groups. Quality of the assembly was high 
with a N50 length of 1856 bases. Of the 107,749 assembled contigs, we identified 4251 that were differentially 
expressed according to a consensus of four different mapping and significance testing approaches. Once duplicate 
isoforms were removed, we were able to annotate 614 up-regulated transfrags and 349 that showed reduced 
expression in the higher temperature group. 

Conclusions: Annotated blast matches reveal that differentially expressed genes correspond to critical metabolic 
pathways previously shown to be important for temperature tolerance in other fish species. Our results indicate 
that rainbowfish exhibit predictable plastic regulatory responses to temperature stress and the genes we identified 
provide excellent candidates for further investigations of population adaptation to increasing temperatures. 
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Background 

The ability of species and populations to adapt to environ- 
mental change is the cornerstone of the emerging field of 
ecological genomics [1,2]. Until recently, genome-wide 
studies of genetic adaptation in non-model organisms 
were not possible. With the advent of massively parallel 
next generation sequencing technologies (NGS), these 
types of studies have become a reality [3] and while 
many of the challenges and preferred strategies are still 
being addressed [4-6], empirical studies are now starting 
to be reported [7-14]. Studies of transcriptome level 
responses to environmental change offer an opportunity 
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to understand the underlying genetic basis for adaptation. 
Such studies represent a powerful approach to assessing 
the genes involved in adaptation to a changing climate, 
particularly increasing temperatures. By profiling tran- 
scriptional changes induced by temperature stress, it is 
possible to identify the gene regions or pathways that are 
likely to be the targets of selection. This information is 
crucial to enable researchers to assess levels of variation 
across these gene regions, at a landscape scale, to predict 
the capacity of organisms to adapt to a warming climate. 

Genes involved in physiological adaptation to tempera- 
ture stress have been uncovered in many species. Heat 
shock proteins [15], alcohol dehydrogenase [16] and 
lactate dehydrogenase genes [17] have all been shown to 
be related to heat tolerance. In fish, the list of candidates 
also includes many from other gene regions related to 
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respiration and protein binding [18-20]. Apart from differ- 
ences in coding regions, transcriptional regulation is also a 
source of variation that can potentially contribute to 
adaptive evolutionary change, particularly in the early 
stages of divergence. Studies in natural populations of 
gobies {Gillichthys mirabilis) have shown that short 
term exposure (<8 hours) to a temperature of 32°C 
induces a strong upregulation of heat shock proteins 
(Hsps) in both gill and muscle tissues [21]. Many other 
transcripts related to a wide variety of biological 
processes including protein homeostasis, cell cycle con- 
trol, cytoskeletal reorganisations, metabolic regulation, 
and signal transduction were differentially expressed in 
treatment and control groups. The majority of these 
genes displayed tissue-specific responses presumably re- 
lated to the differing molecular functions associated with 
each tissue type. Logan and Somero [22] found that, with 
long-term acclimation to increased temperature (up to 
28°C), there was no upregulation of stress-related pro- 
teins and only slight, although detectable, differences in 
expression of genes involved in protein biosynthesis, 
transport and various metabolic categories. This they 
suggest indicates evidence of long-term acclimation 
showing a steady state condition involving relative 
energy costs for different processes. They later showed 
however, that stress related genes {HSP70, UBIQ, and 
CDKN1B) were induced in long-term acclimatised fish 
subsequently exposed to acute heating conditions (4°C/ 
hour) and that the onset temperature for significant 
expression change varied according to acclimation 
temperature [23]. Quinn et al. [24] also found increased 
expression of HSPs and Ubiquitin in Arctic charr 
{Salvelinus alpinus) exposed to temperature stress and 
reported a down regulation of haemoglobin genes in 
fish that showed tolerance to increased temperatures. 
Another cold climate fish, Trematomus bernacchii, has 
been shown to be unable to mount a heat shock re- 
sponse despite retaining the heat shock gene Hsp70 and 
the regulation factor HSF1 [25]. Further work showed 
that many other genes associated with the cellular stress 
response were induced by heat stress. The inability to 
mount a heat shock response however, highlights the 
susceptibility of this species to global warming and 
raises the question as to how this and other species will 
be able to adapt to increasing temperatures. 

Buckley and Hofmann [26] examined the extensive 
plasticity in Hsp induction in gobies acclimatised to 
different thermal backgrounds (13°C, 21°C, and 28°C). 
They found that the activation temperature of the tran- 
scriptional regulator HSF1 was positively associated with 
the acclimatisation temperature indicating that plasticity 
in heat shock response is linked to plasticity in the regula- 
tory framework governing Hsps. While adaptive plasticity 
is often seen as a mechanism that can slow or dampen 



divergent selection, it has been argued that it can also 
lead to rapid speciation if there are strong correlations 
between phenotype and environment combined with 
significant population structure [27]. By examining the 
transcriptomic response to temperature stress we can 
develop a better understanding of the genes and biochem- 
ical pathways that are fundamental to physiological accli- 
matisation to a warming environment and gain insights 
into the regulatory changes that accompany adaptation 
over evolutionary timescales [28]. 

Australian rainbowfish are an ideal species group to test 
hypotheses about the genetic responses to increasing tem- 
peratures. In particular, the crimson-spotted rainbowfish 
(Melanotaenia duboulayi) is a subtropical freshwater spe- 
cies found along a north-south temperature gradient in 
eastern Australia. Their distribution ranges over several 
ecoregions which, coupled with a strong population struc- 
ture and local abundance [29-31], makes them a well 
suited model for studying local adaptation. The ease of 
maintaining captive populations of rainbowfish also make 
them amenable to a range of laboratory-based experimen- 
tal studies [32-34]. In this study, we maintained groups of 
M. duboulayi at ambient and elevated temperature levels 
and then used an RNA-seq approach to assess transcrip- 
tome level changes related to temperature stress. Our aim 
is to provide an initial investigation of the transcriptomic 
response to thermal stress in rainbowfish. As such, this 
will allow for the screening of many more individuals via 
genotyping of candidate SNPs. In addition we present the 
first annotated transcriptome and gene catalogue for the 
order Atheriniformes. Our goal is to identify key candidate 
genes and make a first step towards understanding the 
important biochemical pathways on which selection is 
likely to act in a warming climate. 

Methods 

Source of fish and design of temperature trial 

Crimson spotted rainbowfish were collected using a hand- 
net from a location in the upper reaches of the Brisbane 
River, near the township of Fernvale (27°26'37.39"S, 
152°40'12.76"E). Water monitoring data from the 
Queensland Department of Environment and Resource 
Monitoring (DERM) show the average daily mean tem- 
peratures for this location ranged between 12.2°C in 
winter and 28.3°C in summer from January 1 st 2004 to 
January 1 st 2011 (http://watermonitoring.derm.qld.gov.au). 
Fish were transported live to Flinders University animal 
rearing facility and acclimatised at a temperature of 21°C 
for a period of 30 days prior to the start of the 
temperature trials. For the trials we used only adult male 
rainbowfish of about the same length (a proxy for age), 
since gender and age can affect expression responses [35] . 
These individuals were randomly assigned to a treatment 
or a control group (n = 6 per group). Temperature in the 
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treatment group was increased by 2°C per day over a 
period of six days towards a target of 33°C This target 
represents the projected average summer temperature 
for this region in 2070 based on a high emission sce- 
nario of the International Panel on Climate Change: 
http://wwwxlimatechangeinaustralia.gov.au/qldtempl5. 
php. This temperature condition was then maintained 
for 14 days. The control group was kept at 21°C for the 
duration of the experiment. All animal handling was 
performed in accordance with the Australian Code of 
Practice for the Care and Use of Animals for Scientific 
Purposes, 2004 and approved by the Flinders University 
Animal Welfare Committee (AWC E342). 

RNA extraction, lllumina library preparation and 
sequencing 

Upon completion of the temperature trial, fish were 
sacrificed using AQUI-S® solution [36] and dissected im- 
mediately to remove their livers. Although increased 
temperature has been shown to differentially induce 
expression changes in different tissue types [21,37], we 
were restricted to examining just one tissue type due to 
logistical constraints. We selected liver due to previous 
research linking this tissue type to heat stress responses 
[38-40]. Total RNAs were individually extracted using 
the Ambion Magmax™-96 total RNA isolation kit (Life 
Sciences) according to the manufacturers instructions. 
Briefly, 5 mg of tissue was placed in the lysis solution 
and homogenised in Qiagen Tissuelyzer™ for a period of 
30 sec. Nucleic acids were captured onto magnetic 
beads, washed and treated with DNase. Total RNA was 
then eluted in 50 \A elution buffer. RNA quality and 
concentration was measured using an RNA Pico chip on 
an Agilent Bioanalyzer. Normalised starting quantities of 
total RNA were then used to prepare 12 separate 
lllumina sequencing libraries with the TruSeq™ RNA 
sample preparation kit (lllumina). Library preparation 
was performed as per the manufacturer s instructions. In 
the final step before sequencing, all 12 individual libraries 
were normalised and pooled together using the adapter 
indices supplied by the manufacturer (lllumina MID 
tags 2, 4-7, 12-16, 18, 19). Pooled sequencing was then 
performed as 101 bp, paired-end reads in a single lane 
of an lllumina HiSeq2000 instrument housed at the 
Ramaciotti Centre for Gene Function Analysis, University 
of New South Wales. 

Quality control and de novo assembly 

Sequence data were sorted by individual and adapters 
were trimmed by the service provider prior to analysis. 
Quality filtering was performed using the FastX-toolkit 
suite of pre-processing tools (http://hannonlab.cshl.edu/ 
fastx_toolkit/index.html) in a Galaxy setting [41]. Based 
on the FastX quality statistics, the first two and last 5 



bases were trimmed from each read as they had consist- 
ently low phred scores (<Q15). Paired reads were then 
joined and a quality filter applied such that any combined 
reads having <90% of bases with a phred score of Q20 or 
higher were discarded. Paired reads were then split and in- 
terleaved to suit the input style of the de novo assembly 
program. Transcriptome assembly was performed de novo 
with the program Velvet/Oases [42]. This program recon- 
structs independent assemblies based on different k-mer 
values used to build a de Bruijn graph. The program then 
uses dynamic error removal adapted to RNA-seq data and 
implements a robust scaffolding method to predict full 
length transfrags. Multiple single k-mer assemblies are 
then merged to cover genes at different expression levels 
without redundancy. Two individuals from each of the 
treatment and control groups were pooled as input for the 
assembly. Assemblies were compiled for a k-mer range of 
19 to 49 with an expected insert size between paired ends 
of 300 bp and a coverage cut-off value set to 4.2. We 
tested different merged assembly ranges based on the 
summary statistics for each individual k-mer assembly 
[43]. The outcome of each merge was assessed with re- 
spect to the optimal assembly parameters [4]. The optimal 
assembly should achieve the best balance between large 
median, mean and N50 contig lengths while minimising 
the total number of contigs but maintaining a large 
summed contig length. As Oases is vulnerable to mis- 
assembly at low k-mer values, we adopted a conservative 
approach of merging k-mer values > k = 19. Optimal 
assembly was achieved with a k-mer range of 19 to 41. 

Mapping of sequence reads and differential expression 
analysis 

To test for differential expression (DE), individual se- 
quence reads for each sample were mapped back to the 
assembled transcriptome with the alignment program 
Bowtie [44]. Bowtie was implemented in the -v alignment 
mode with the maximum number of mismatches set to 3. 
Paired end reads were aligned to the transcriptome with 
both read pairs needing a valid alignment within a given 
locus to be counted as a match. If more than one align- 
ment was possible the best match was reported according 
to the least number of mismatches for each read and 
overall for the pair. The reproducibility of the alignment 
approach was tested by performing the mapping step 
with BWA, an alternative alignment program [45]. The 
number of reads aligning to each transfrag for each 
sample was calculated with the IdxStats command of 
Samtools [46]. Count data was then used as input for 
the program DESeq [47] which estimates variance-mean 
dependence in the data and tests for differential expres- 
sion based on the negative binomial distribution. The six 
samples from each treatment were used to generate mean 
expression levels with associated variances. Differential 
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expression was tested at a significance level of a= 0.05 
adjusted to match a 5% false discovery rate using the 
Benjamini-Hochberg procedure. The threshold for fold- 
change differences is determined by the significance 
testing as the power to detect significant differential 
expression depends on the expression strength. For 
weakly expressed genes, stronger changes are required 
for the gene to be called significantly expressed. We also 
compared DE methodology by running the EdgeR pro- 
gram to assess significant differences in the count data. 
A consensus list of DE genes was then generated from 
the four analysis approaches adopted (i.e. Bowtie-DESeq, 
Bowtie-EdgeR, BWA-DESeq, BWA-EdgeR). Significantly 
up and down regulated transfrags were selected and 
blasted against the NCBI database using blastx in the 
program Blast2GO [48]. Blastx was performed against the 
NCBI nucleotide database with the minimum E-value 
score set to 1.0E-06. To assign gene ontology terms to 
each annotated sequence, successful blast hits were 
mapped and annotated using Blast2GO for the entire 
assembled transcriptome with the annotation cut-off 
threshold set to 55 and the GO level weighting set to 5. 

Results and discussion 

Raw sequencing data and quality statistics 

The single lane of Illumina HiSeq2000 produced close to 
128 million paired-end reads (2 x 101 bp). After trim- 
ming and quality filtering, 12.3% of reads were discarded 
leaving over 224 million reads for downstream analysis 
(2 x 94 bp). The final number of reads per individual 
ranged from 11.7 million to 29 million (mean = 18.7 
million ± 1.4 million). The number of reads in each 
treatment group was well balanced with 112.3 million in 
the 21°C group and 112.0 million in the 33°C group 
(Additional file 1: Table SI). We selected the best k-mer 
merge range for assembly based on the distribution of 
assembly statistics for the individual k-mer assemblies 
from k = 19 to k = 49 (see Table 1). The merged assem- 
bly from a k-mer range of 21 to 39 scored best on the 
balance of these parameters with a N50 value of 1,856 
and a total number of contigs of 107,749. While this 
range may exclude some rare, low-abundant transcripts, 
it presents a more conservative and reliable approach to 
differential expression testing by emphasising the accur- 
acy of the assembly rather than the identification of low- 
abundant transcripts from both treatments. Annotation 
of the transfrags with the Blast2Go software suite 
resulted in 65,105 (60.4%) blast hits and 53,278 (49.4%) 
successfully annotated sequences. 

Differential expression analyses 

The four different combinations of mapping and DE test- 
ing produced vastly different numbers of DE transfrags 
(see Table 1, Figure 1). The combination of BWA 



alignment followed by EdgeR DE analysis identified the 
most with 14,076 DE transfrags, whereas Bowtie followed 
by DESeq identified the least with 5,577 (Figure 1). The 
difference between the approaches likely arises from the 
different characteristics of the two aligners combined with 
the sensitivities of the DE tests. Bowtie does not allow 
gapped alignments and makes use of the base quality 
scores [49], making it more conservative than BWA in the 
number of mapped reads. On the other hand DESeq has 
also been shown to be more conservative than EdgeR 
when identifying DE genes from low count data [50] 
which likely explains the lower number of hits in multi- 
plex sequencing strategies such as ours. The total num- 
ber of DE transfrags identified by all four approaches 
was 4,251 (Figure 2). We adopted a conservative ap- 
proach and selected only these transfrags to blast 
against the reference database. Future RNA-seq studies 
should assess their priorities for DE gene discovery and 
select the detection strategy based upon the need for 
identifying lowly expressed genes versus the accuracy 
expected given the number of replicates used [51]. Robles 
et al [50] showed that EdgeR could be used to detect 
higher numbers of DE transfrags from low count data 
without compromising accuracy when the number of bio- 
logical replicates was at least six in each treatment group. 

The Blast2GO program was able to find sequence 
similarities for 2,740 of the DE transfrags but could not 
find mapping or annotation information for a further 
634 of them, leaving 2,106 DE transfrags which were 
successfully annotated. The top 15 matching species 
from the BLAST query were all fish species with the 
most BLAST hits being for the Nile tilapia Oreochromis 
niloticus with 583 matches. Duplicate gene isoforms 
were detected by matching identical annotated gene 
names from the Blast2GO output. These isoforms were 
then combined and reported as single "genes". Once 
isoforms were combined, there were 614 genes that were 
up-regulated in the high temperature treatment with 
349 genes being down-regulated (see Additinal file 1: 
Table S2a and b). For significantly down- regulated 
transfrags, the mean fold-change between ambient and 
high-temperature conditions was 4.0-fold, with a range 
from 55.6-fold for g2/m phase specific e3 ubiquitin- 
protein ligase to 2.2-fold for the Phytanoyl-peroxisomal- 
like protein. The mean fold-change for significantly 
up-regulated transfrags was 11.13, ranging from 1.98 
(for the cyclin-dependent kinase 2 interacting protein) 
to 259-fold (for the Heat shock protein Hsp-90-like). 

Ontology of differentially expressed genes 

Many functional classes of genes were affected by 
temperature stress. As expected, heat shock protein 
genes including HSPA4 (12.3 x), Hsp60 (6.6 x), Hsp70 
(9.9 x) and Hsp90a (141.3 x) were significantly up- 



Table 1 Assembly statistics for k-mer lengths 19-49 and different k-mer merge ranges from the Oases de novo assembly program 





k19 


k21 


k23 


k25 


k27 


k29 


k31 


k33 


k35 


k37 


Total sequences 


1.2E+05 


7.3E+04 


6.2E+04 


5.5E+04 


5.2E+04 


5.0E+04 


4.8E+04 


7.0E+04 


8.2E+04 


8.1E+04 


Total bases 


6.7E+07 


6.1E+07 


5.6E+07 


5.3E+07 


5.1E+07 


5.0E+07 


4.9E+07 


6.2E+07 


7.2E+07 


7.3E+07 


Min sequence length 


7.1E+01 


1 .OE+02 


8.1E+01 


1 .OE+02 


9.8E+01 


1 .OE+02 


9.0E+01 


1 .OE+02 


1 .OE+02 


1 .OE+02 


Max sequence length 


1.5E+04 


1 .7E+04 


2.0E+04 


1 .8E+04 


2.1E+04 


2.3E+04 


1 .8E+04 


1 .2E+04 


1.3E+04 


1.3E+04 


Average sequence length 


558.04 


837.19 


906.27 


960.63 


979.44 


991.21 


1010.47 


884.15 


888.33 


901.83 


Median sequence length 


356 


527 


546 


580 


584 


590 


605 


595 


583 


584 


N50 length 


873 


1397 


1585 


1686 


1746 


1759 


1801 


1398 


1460 


1493 


(A + ^s 


55.25% 


55.32% 


55.25% 


55.16% 


55.16% 


55.17% 


55.12% 


55.07% 


55.21% 


55.35% 


(G + C)s 


43.99% 


44.27% 


44.47% 


44.60% 


44.64% 


44.67% 


44.72% 


44.83% 


44.63% 


44.49% 


Ns 


0.77% 


0.41% 


0.28% 


0.23% 


0.21% 


0.16% 


0.16% 


0.10% 


0.16% 


0.16% 



Table 1 Assembly statistics for k-mer lengths 19-49 and different k-mer merge ranges from the Oases de novo assembly program (Continued) 



k39 k41 k43 k45 k47 k49 k19_39 k21_39 k25_47 k21_49 



Total sequences 


8.2E+04 


4.3E+04 


4.1E+04 


4.0E+04 


3.9E+04 


5.6E+04 


4.5E+05 


1.1E+05 


4.0E+05 


4.9E+05 


Total bases 


7.2E+07 


4.4E+07 


4.3E+07 


4.2E+07 


4.1E+07 


5.1E+07 


4.8E+08 


1.3E+08 


4.4E+08 


5.5E+08 


Min sequence length 


1 .OE+02 


9.9E+01 


1 .OE+02 


1 .OE+02 


1 .OE+02 


1 .OE+02 


1 .OE+02 


1 .OE+02 


1 .OE+02 


1 .OE+02 


Max sequence length 


1 .4E+04 


1 .7E+04 


1 .7E+04 


1 .7E+04 


1 .7E+04 


1 .4E+04 


2.3E+04 


1 .7E+04 


2.3E+04 


2.3E+04 


Average sequence length 


884.62 


1026.19 


1037.21 


1042.06 


1049.7 


903.57 


1071.74 


1245.3 


1114.01 


1124.62 


Median sequence length 


568 


624 


634 


639 


650 


608 


780 


930 


805 


818 


N50 length 


1492 


1785 


1789 


1795 


1786 


1457 


1589 


1856 


1671 


1689 


(A + ^s 


55.38% 


55.04% 


55.03% 


55.05% 


55.03% 


54.85% 


55.95% 


55.11% 


55.69% 


56.13% 


(G + Qs 


44.48% 


44.86% 


44.87% 


44.87% 


44.89% 


45.08% 


44.05% 


44.89% 


44.31% 


43.87% 


Ns 


0.14% 


0.10% 


0.10% 


0.08% 


0.08% 


0.06% 


0.00% 


0.00% 


0.00% 


0.00% 
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Figure 1 Overlap between the number of differentially 
expressed transfrags detected from the four combinations of 
mapping and significance testing methods for sequences 
involved in transcriptomic response to increased temperature 
for the rainbowfish Melanotaenia duboulayi. See text for details 
of mapping and testing methods used. 



regulated in heat stressed fish. These transcripts are well 
characterised as stress inducible and have been shown, 
in many species, to be involved in protection against 
apoptosis or as a molecular chaperone under extended 
exposure to heat stress [15,19,20,52-56]. Further to these 
well-characterised stress related genes, the gene ontology 
analysis also identified transcripts involved in catabolism 
(11% of annotated sequences) and lipid metabolism (12% 
of annotated sequences) as being the important biological 
processes in the response to temperature stress (Figure 3a). 
As with other studies in fish, regulation of metabolic 
processes are clearly important parts of the heat stress 
response [21,22,24]. A large proportion of the individual 
over-expressed genes in rainbowfish were related to 
oxidoreductase activity, mitochondrial components and 
organelle membranes. These gene categories are typic- 
ally associated with increased metabolism, particularly 




1e-01 1e+01 1e+03 1e+05 

Mean expression across all samples 

Figure 2 Differential expression of 107,749 transfrags 
assembled for the rainbowfish Melanotaenia duboulayi under 
different temperature treatments (21 °C vs. 33°C). Results are 
shown as the log 2 fold change in expression versus the mean 
expression level between treatment groups. Red dots above zero 
fold change represent significantly up-regulated transfrags whereas 
red dots below zero fold change represent significantly down- 
regulated tranfrags at the 0.5 false discovery rate. 



to cope with increased temperature and the related hyp- 
oxic conditions. Additionally we found a role for genes 
of the ubiquitin family and the gene 78 kDa glucose- 
regulated protein precursor which, similar to Quinn 
et al. [57], were upregulated in response to heat stress. 
Gene ontology analysis also identified biomolecular 
binding and catalytic activity as the major molecular 
functions affected by exposure to different temperature 
regimes (see Figure 3b). Within these broad categories, 
protein binding and ATP binding were the major bio- 
molecular binding functions affected by differentially 
expressed transfrags with node scores of 244 and 226 
respectively. For catalytic activity, transferase activity 
(nodescore = 53) and oxidoreductase activity were prom- 
inent (node score = 54). These functional categories, com- 
bined with electron carrier activity (node score = 63), is 
congruent with the expected role of aerobic respiration in 
response to the increased temperature. 

While the Hsp genes are commonly identified as 
overexpressed in short-term temperature manipulation 
experiments [24,37], they are less likely to be targets for 
selection during gradual temperature shifts associated 
with climate change [22,53]. Hsp genes represent a 
physiological response to sudden stressors and therefore 
plasticity in these traits is unlikely to be adaptive over 
longer timescales [58]. The more likely candidates for an 
adaptive genetic response are those genes involved in what 
has been termed the "cellular homeostatis response" to 
long-term temperature stress [59]. Unlike stress response 
genes that provide an immediate early response to macro- 
molecular damage and sudden changes in cellular redox 
potential, the cellular homeostatasis response involves 
effector proteins mediating parameter specific adaptation 
to environmental change. 

Responses associated with prolonged exposure to heat 
stress 

Prolonged exposure to increased temperatures has previ- 
ously been associated with gene ontologies related to pro- 
tein folding, oxidative stress and immune function [18,19]. 
Similarly, we detected significant upregulation of genes 
with these ontologies in the high temperature treatment 
such as Calnexin (2.8 x), NADH dehydrogenase (2.5 x), 
and glutathione S-transferase (5.1 x) suggesting long-term 
reallocation of energy resources. Plasticity in the expres- 
sion of these genes is more likely to be adaptive and allow 
localised populations to survive in a changing environ- 
ment, eventually leading to divergent selection. Kassahn 
et al. [53] grouped stress-response transcripts into four 
different clusters according to the pattern of regulation 
detected under short versus long-term exposure to heat 
stress. They suggested that long-term exposure to heat 
stress in a coral reef fish (31°C for five days) induces ex- 
pression of genes involved in development and immune 
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(b) Molecular Function 



Molecular 




(c) Cellular Component 




Figure 3 Distribution of annotated transfrags assigned to (a) biological processes or (b) molecular functions or (c) the cellular 
components according to gene ontology association. Analysis carried out with the Blast2Go program for sequences involved in 
transcriptomic response to increased temperature for the rainbowfish Melanotaenia duboulayi. 

J 



Smith et al. BMC Genomics 2013, 14:375 
http://www.biomedcentral.com/1471 -21 64/1 4/375 



Page 9 of 12 



Table 2 Candidate genes for broad scale studies of temperature response in the crimson spotted rainbowfish, 
Melanotaenia duboulayi 
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Table 2 Candidate genes for broad scale studies of temperature response in the crimson spotted rainbowfish, 
Melanotaenia duboulayi (Continued) 

cytochrome p450 1 a 5.02E-152 -3.70 Oendoplasmic reticulum membrane 

thyrotrophic embryonic factor 9.16E-157 -3.85 P:cellular response to light stimulus 

nuclear receptor subfamily 1 group d member 2 3.54E-102 -4.17 P:steroid hormone mediated signaling pathway 

vitellogenin ab 0 -10.0 FJipid transporter activity 

Genes correspond to transfrags with mid-range differential expression values that match metabolic, developmental, or immune response processes from gene 
ontology assignment by the Blast2Go program. Gene ontology abbreviations: P= biological process, F= molecular function, C= cellular component. 



function whereas genes related to metabolic function 
are suppressed. Our data, from long-term exposure to 
heat stress in rainbowfish (33°C for 14 days), support 
those findings. Developmental processes and metabolic 
processes accounted for 48% of dysregulated transfrags 
(Figure 3a). Immune function seems less important in 
our dataset and is covered by the "response to stimuli" 
category representing 9% of DE transfrags including the 
natural killer cell enhancement factor (upregulated 2.8 x). 
It is possible that the longer exposure to heat stress in our 
study allowed recovery from the immediate activation of 
the immune function genes. 

Under simulated models of divergence with plasticity, 
selection is possible when plasticity is moderate, dispersal 
ability is low and there are no fitness costs to plasticity 
[60]. It may therefore be worthwhile to focus attention on 
those gene regions that showed mid-range shifts in 
expression level in the treatment group when looking for 
evidence of adaptive evolution. In particular, the mid- 
range transfrags related to metabolic and developmental 
processes as well as immune function are likely to be good 
candidates as genes of adaptive significance for increasing 
temperatures (Table 2). Rainbowfish represent ideal candi- 
dates for studies of local adaptation due to their reduced 
dispersal and distribution over multiple ecoregions. The 
suite of genes that we present here showing a response to 
increased temperature are a good starting point for further 
manipulative experiments or landscape wide surveys of 
genetic variation. Creating a catalogue of polymorphisms 
at these genes throughout the range of M. duboulayi will 
provide insights into the adaptive potential of this species 
in the face of a warming environment. 

RNA-seq recommendations for non-model taxa 

The results of this study highlight the appropriateness of 
an RNA-seq approach for studies of adaptation (including 
adaptive plasticity) in non-model organisms. With the 
paucity of genomic resources available for most wildlife 
species, NGS technologies offer the best hope for unravel- 
ling the processes of evolutionary adaptation in a natural 
setting. Rainbowfish are evolutionarily very different from 
their nearest genome-enabled species, Oryzias latipes, yet 
in this study we were able to generate a substantial list of 
candidate genes involved in a response to increasing tem- 
peratures. Over the past few years, the proliferation of 



software resources and validated pipelines for RNA-seq 
means that virtually any organism can now be the focus of 
ecological genomic research and this is reflected in the 
rapid increase in publications reporting RNA-seq analyses 
in non-human taxa. The limiting factors that remain now 
are bioinformatic expertise and incomplete reference data. 
Over half of the dysregulated transfrags identified in our 
study were unable to be identified or were of unknown 
function. This continues to be a major challenge for stud- 
ies of ecological and evolutionary genomics [6]. Interpret- 
ation of genomic data lags well behind the current ability 
to generate that data. The limitation stems from the fact 
that annotation of genes of ecological interest still relies 
upon inferring homologies with genomic features 
established and developed in a few model species for 
non-ecological purposes. Better data integration is 
needed to facilitate the association of gene transcripts 
with specific natural conditions or phenotypic re- 
sponses. Further work to characterise the function of 
these unknown genes via experimental studies of non- 
model organisms will enhance our understanding of the 
important biological pathways involved in responses to 
temperature stress and other environmental changes. 
We have shown that differing mapping and DE analysis 
approaches lead to very different outcomes in terms of 
the DE genes identified. While a combination of all 
available approaches is preferable to identify overlap in 
the candidate genes detected, we found that combining 
output from just Bowtie mapping and DESeq signifi- 
cance testing with BWA mapping and DESeq signifi- 
cance testing delivered just 21 more DE genes than 
combining all four approaches tested in our study (see 
Figure 1). This conservative approach is an efficient way 
to avoid large numbers of false positives being detected 
in RNA-seq studies. 

Conclusions 

Temperature increases predicted over the coming de- 
cades suggests species with limited dispersal abilities will 
need substantial adaptive potential to avoid extinction. 
That adaptive potential will likely come from a number 
of sources including adaptive phenotypic plasticity, 
standing genetic variation, and newly-derived mutations. 
Regardless of the source, adaptation will be most im- 
portant in those processes related to heat tolerance. We 
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have presented a first insight into which processes are 
likely to be important in the rainbowfish, M. duboulayi. 
This provides a foundation for future research into 
temperature-driven adaptive responses in freshwater 
species but also invites more detailed study of the 
phenome-genome interaction under conditions of 
temperature stress. 

We identified a predictable suite of heat shock genes 
that responded sharply to increased temperatures in the 
treatment group. However, we also identified transfrags 
related to regulation of metabolic functions and develop- 
mental processes that showed mid-range levels of 
dysregulation and may be stronger candidates as genes 
for long-term adaptation to a warming environment. We 
present these candidate genes as targets for ongoing re- 
search into populations representing different thermal 
environments throughout the species range. We also ex- 
pect that these candidates will be useful targets for stud- 
ies of other freshwater species experiencing long-term 
thermal challenges. 

The expression level changes we have presented may 
be an example of a plastic response. To check for an 
adaptive component it is necessary to repeat the 
temperature trial on other geographically distant popula- 
tions and/or sister taxa. Parallel expression level changes 
in these populations would indicate plasticity whereas al- 
tered responses would be suggestive of adaptation at the 
genome level. Such "common garden" experiments allow 
the disentangling of pure plastic vs. genetic responses 
and are ideal approaches for future research. Other ave- 
nues to explore evolutionary adaptation to increased 
temperatures include investigating if DNA polymor- 
phisms are present within and between populations at 
the gene regions we have identified in this study. Exten- 
sions of this research to include adaptive traits from 
other important environmental impacts will enable a 
much broader understanding of how freshwater species 
are likely to cope with human-induced habitat and cli- 
matic change. 

Availability of supporting data 

Raw sequencing data is available through the NCBI Se- 
quence Read Archive under Project ID PRJNA205235 
(http://trace.ncbi.nlm.nih.gov/Traces/sra/). All samples 
were sequenced as 101 bp paired-end reads on an 
Illumina HiSeq2000 sequencer. 

Additional file 



Additional file 1: Table SI. Sequencing statistics for individual paired 
end reads from the pooled RNA-Seq library from M duboulayi sequenced 
in a single lane of the Illumina HiSeq 2000. Table S2a. Annotated genes 
matching up-regulated transfrags in the high temperature group of 
M. duboulayi. Mean similarity is computed as the average similarity value 



for all the hits of a given sequence. Gene ontology abbreviations: 
P= biological process, F= molecular function, C= cellular component. 
Table S2b. Annotated genes matching down-regulated transfrags in the 
high temperature group of M. duboulayi. Mean similarity is computed as 
the average similarity value for all the hits of a given sequence. 
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